low-rank decomposition
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
Self-supervised Graph Neural Networks via Low-Rank Decomposition
Self-supervised learning is introduced to train graph neural networks (GNNs) by employing propagation-based GNNs designed for semi-supervised learning tasks. Unfortunately, this common choice tends to cause two serious issues. Firstly, global parameters cause the model lack the ability to capture the local property. Secondly, it is difficult to handle networks beyond homophily without label information.This paper tends to break through the common choice of employing propagation-based GNNs, which aggregate representations of nodes belonging to different classes and tend to lose discriminative information. If the propagation in each ego-network is just between the nodes from the same class, the obtained representation matrix should follow the low-rank characteristic. To meet this requirement, this paper proposes the Low-Rank Decomposition-based GNNs (LRD-GNN-Matrix) by employing Low-Rank Decomposition to the attribute matrix. Furthermore, to incorporate long-distance information, Low-Rank Tensor Decomposition-based GNN (LRD-GNN-Tensor) is proposed by constructing the node attribute tensor from selected similar ego-networks and performing Low-Rank Tensor Decomposition. The employed tensor nuclear norm facilitates the capture of the long-distance relationship between original and selected similar ego-networks. Extensive experiments demonstrate the superior performance and the robustness of LRD-GNNs.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations
Tahmasebi, Faraz, Pelluer, Michael, Kwon, Hyoukjun
The computation and memory costs of large language models kept increasing over last decade, which reached over the scale of 1T parameters. To address the challenges from the large scale models, model compression techniques such as low-rank decomposition have been explored. Previous model decomposition works have focused on weight decomposition to avoid costly runtime decomposition, whose latency often significantly exceeds the benefits from decomposition (e.g., 38% more end-to-end latency when running Llama2-7b on A100 with 4K sequence length with activation decomposition compared to no decomposition). In this work, we debunk such observations and report that the input decomposition can be significantly beneficial with a proper choice of decomposition algorithm and hardware support. We adopt progressive decomposition algorithm, Lanczos algorithm, and design a co-accelerator architecture for the decomposition algorithm. To address the memory- boundness of the decomposition operation, we introduce a novel compute replication methodology that moves the op- eration toward compute-bound region, which enables 6.2x speedup in our evaluation. We also develop an output shape- preserving computation scheme that eliminates decomposi- tion costs in consecutive layers. To compensate model quality loss from compression, we introduce a multi-track decom- position approach that separately handles outlier channels for high accuracy and low perplexity with minimal compu- tational costs. Combined together, our accelerator, D-com, provides 22% end-to-end latency improvements compared to A100 GPU at the cost of small model quality degradation (e.g., 3% on AI2 Reasoning Challenge task).
- North America > United States > California > Orange County > Irvine (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
Vanishing Contributions: A Unified Approach to Smoothly Transition Neural Models into Compressed Form
Nikiforos, Lorenzo, Antoniadis, Charalampos, Prono, Luciano, Pareschi, Fabio, Rovatti, Riccardo, Setti, Gianluca
Abstract--The increasing scale of deep neural networks has led to a growing need for compression techniques such as pruning, quantization, and low-rank decomposition. While these methods are very effective in reducing memory, computation and energy consumption, they often introduce severe accuracy degradation when applied directly. We introduce V anishing Contributions (VCON), a general approach for smoothly transitioning neural models into compressed form. Rather than replacing the original network directly with its compressed version, VCON executes the two in parallel during fine-tuning. The contribution of the original (uncompressed) model is progressively reduced, while that of the compressed model is gradually increased. This smooth transition allows the network to adapt over time, improving stability and mitigating accuracy degradation. We evaluate VCON across computer vision and natural language processing benchmarks, in combination with multiple compression strategies. Across all scenarios, VCON leads to consistent improvements: typical gains exceed 3%, while some configuration exhibits accuracy boosts of 20%. VCON thus provides a generalizable method that can be applied to the existing compression techniques, with evidence of consistent gains across multiple benchmarks. EEP NEURAL NETWORKS (DNNs) have shown significant capabilities in solving a wide array of complex tasks across multiple domains. These tasks include, but are not limited to, computer vision, natural language processing, speech recognition, medical image analysis, autonomous driving, and even drug discovery [1], [2].
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (2 more...)
BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
Sy, Yaya, Cerisara, Christophe, Illina, Irina
Pruning large pre-trained transformers for low-resource languages is challenging, as it often requires massive retraining data to recover performance. For instance, Distill-Whisper prunes Whisper by 40% and retrains on 21,000 hours of speech, far beyond what is available for most languages. Can Whisper be made lighter and faster for edge devices in data-scarce settings? Focusing on Bambara with only 32h of speech-to-text data, we propose a new pruning recipe. Instead of vocabulary pruning, which is unsuitable due to frequent code-switching by Bambara speakers, we compress the embeddings with low-rank decomposition and feature distillation. Rather than removing layers, we merge them to limit performance loss. The final model preserves 90% of the original performance while being 48% smaller and 2.15x faster on a MacBook Air M1.
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.40)
- Africa > Mali (0.05)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning
Li, Shiwei, Luo, Xiandi, Wang, Haozhao, Tang, Xing, Xu, Shijie, Luo, Weihong, Li, Yuhua, He, Xiuqiang, Li, Ruixuan
To improve the training efficiency of federated learning (FL), previous research has employed low-rank decomposition techniques to reduce communication overhead. In this paper, we seek to enhance the performance of these low-rank decomposition methods. Specifically, we focus on three key issues related to decomposition in FL: what to decompose, how to decompose, and how to aggregate. Subsequently, we introduce three novel techniques: Model Update Decomposition (MUD), Block-wise Kronecker Decomposition (BKD), and Aggregation-Aware Decomposition (AAD), each targeting a specific issue. These techniques are complementary and can be applied simultaneously to achieve optimal performance. Additionally, we provide a rigorous theoretical analysis to ensure the convergence of the proposed MUD. Extensive experimental results show that our approach achieves faster convergence and superior accuracy compared to relevant baseline methods. The code is available at https://github.com/Leopold1423/fedmud-icml25.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Canada (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)